THE EFFECTS OF RATER STRESS
ON  PERFORMANCE  RATING  ACCURACY 

Performance  appraisal  systems  ultimately  rely  on 
judgments about an individual's behavior made by one or more
other persons. Many years of research on performance
appraisals indicate that these judgments are vulnerable to
bias or distortion. A recent trend in performance rating
research  is  to  address  the  cognitive  processes  that  are 
involved  when  raters  obtain  information  about  and  make 
ratings  of  a  person's  performance. 

Cognitive Categorization in Performance Appraisal

Cognitive models of performance rating have been
presented by DeNisi, Cafferty, and Meglino (1984), Ilgen and
Feldman (1983), and Landy and Farr (1983). The common
characteristic of these models is that the rater is an active
seeker of information who processes information in a series
of cognitive operations--first, observing the behavior or
other cues that supply information concerning the ratee's
performance, encoding this information, storing it in memory,
and finally, retrieving it when the time comes to make an
evaluation of the performance.

One  process  thought  basic  to  the  observation  and 
encoding of performance information is categorization, the
automatic, non-deliberate coding of people in terms of certain
common characteristics (Feldman,
1981). Categorization allows the rater to reduce the
complexity and amount of performance information processed.
Rosch and Mervis (1975) refer to "family resemblances" in
defining the nature of categories. Each member shares some
attributes with some, but not all, of the other members; the
level  of  family  resemblance,  or  typicality,  of  a  member 
depends  on  the  number  of  attributes  it  shares  with  other 
members.  Some  members,  having  greater  typicality,  are  better 
examples  of  a  category  than  are  others. 

This  is  similar  to  the  prototype  approach  to 
categorization  of  Cantor  &  Mischel  (1979).  They  maintain 
that  we  develop  prototypes  (abstract  knowledge  structures 
summarizing  family  resemblances  among  category  members)  as  a 
means  of  grouping  persons.  Such  prototypes  allow  one  to 
organize  knowledge  about  the  probable  behavior,  attitudes, 
and  other  attributes  of  particular  individuals.  For  example, 
the  prototypic  rock  musician  may  be  loud,  irresponsible, 
promiscuous, a drug user, and a wild dresser. Therefore,
anyone labelled "rock musician" is likely to be automatically
perceived as engaging in these behaviors and holding similar
attitudes. This simplifies and reduces the need to learn,
store, and recall information about individuals.

Categorization and the Rating Process

When  a  rater  is  required  to  make  a  categorical  judgment 
(e.g.,  "good  worker")  on  the  basis  of  limited  knowledge  of 
the  ratee,  the  rater  may  search  for  particular,  prototypic 
category  attributes  (e.g.,  punctuality,  loyalty)  and  for  the 
extent  to  which  these  attributes  are  consistently  displayed 
by the ratee. If the ratee is perceived to possess such
prototypic  attributes,  he  or  she  is  likely  to  be  assumed  to 
possess  other  category  attributes  as  well. 

Ilgen and Feldman (1983) maintain that the performance
appraisal  task  is  better  characterized  as  a  memory-based 
judgment  process  than  a  stimulus-based  one.  Supervisors  are 
often  distracted  during  observations  of  subordinates  and  must 
often  rely  on  categorizations  that  they  have  formed  of  the 
person,  not  unique  features  of  current  performance.  The 
recalled  behavior  may  be  reconstructed  on  the  basis  of  these 
prototypes.  The  rater  may  observe  the  ratee  performing  a 
selected number of behaviors (e.g., is courteous to customers,
is on time) which are contained within the category of "good
worker". Then, instead of attending to the ratee's other
current characteristics or behaviors, the rater will
attribute the other characteristics contained within the
"good worker" prototype to that ratee. Therefore, any rating
scale is subject to prototype-based distortion and halo error,
and  purely  evaluative  responses  (global  ratings  of 
"goodness")  are  thought  to  be  based  on  stored  evaluative 
impressions  associated  with  these  prototypes.  Cooper  (1981) 
argues  from  a  similar  viewpoint  that  halo  error  in  ratings  is 
strongly  influenced  by  cognitive  distortions  based  on 
"illusory" theories that raters develop about how behavioral
dimensions  covary. 

STRESS  AND  THE  RATING  PROCESS 

It  is  evident  that  some  situations  are  more  likely  to 
result  in  categorization  and  the  use  of  limited  information 
in  making  performance  appraisals.  Cohen  (1981)  posits  that 
raters  under  time  constraints  are  more  likely  to  recall  only 
category-consistent  information,  as  this  requires  less 
cognitive  energy.  This  has  been  supported  by  research 
conducted  in  marketing  and  decision-making.  For  example, 
Staelin  and  Payne  (1976)  found  that,  when  facing  time 
pressure  and  distraction,  shoppers  tried  to  reduce  search 
time by collecting fewer pieces of information and generally
searching for negative information. Other research in
decision-making has shown that judges facing time pressures
or distractions use fewer cues in making decisions
(Christensen-Szalanski, 1980) and rely more heavily on
negative information (Wright, 1974), as negative information
is seen as more informative than positive information.

In  line  with  the  work  of  Staelin  and  Payne  (1976)  with 
pressured  shoppers,  and  that  of  Christensen-Szalanski  (1980) 
with pressured decision-makers, Srinivas and Motowidlo (1985)
have found that the amount of stress on the rater can affect
the performance rating process. They suggest that stress
will lead a rater to rely on simple prototypes rather than
actual information obtained from observing the performance of
ratees, and this may contribute substantially to rating
distortion.

Although  there  is  no  universally  accepted 
conceptualization  or  definition  of  stress  (Alluisi,  1982; 
Schuler, 1980), many have been offered (e.g., Caplan, Cobb,
French, Van Harrison, & Pinneau, 1975; Janis & Leventhal,
1968; Lazarus, Deese, & Osler, 1952; Margolis & Kroes, 1974;
McGrath, 1976). Janis and Leventhal (1968) describe stress
as  an  unpleasant  emotional  state,  involving  negative 
affective  responses  such  as  anxiety,  irritation,  and 
depression.  Additionally,  stress  is  partially  due  to 
environmental  demands  which  threaten  to  exceed  the 
individual's  capabilities  and  resources  for  meeting  them 
(Caplan et al., 1975; McGrath, 1976). One such
environmental situation, work overload, was used by Srinivas
and Motowidlo (1985) in their study on stress and information
processing. 

Cohen  (1980),  in  a  review  of  the  literature  on  effects 
of  stress  on  performance,  found  overwhelming  evidence  of  a 
post-stimulation  effect,  especially  when  the  stress  is 
unpredictable.  That  is,  the  psychological  state  produced  by 
stress  endures,  and  influences  behavior  even  after  the 
stressor  has  been  removed.  This  psychological  state  affects 
performance  on  subsequent  tasks,  particularly  those  tasks 
requiring  tolerance  for  frustration,  clerical  accuracy,  and 
the  ability  to  avoid  perceptual  distractions.  Effects  of 
stress  on  social  behavior  include  a  decrease  in  sensitivity 
to  others,  helping  behavior,  recognition  of  individual 
differences,  and  an  increase  in  aggression.  There  are  two 
explanations  of  these  post-stimulation  effects  of  stress  that 
may  be  considered  applicable  to  a  performance  appraisal 
situation--the psychic cost hypothesis and the frustration-mood
hypothesis (Cohen, 1980; Srinivas & Motowidlo, 1985).

The  Psychic  Cost  Hypothesis.  This  theory  has  to  do  with 
the  individual's  limited  attentional  capacity  (Miller,  1956), 
and  states  that  the  attentional  capacity  shrinks  when  there 
are  prolonged  demands  placed  on  it.  Therefore,  prolonged 
exposure  to  an  environmental  stressor  such  as  a  high 
information  rate  task  should  result  in  cognitive  fatigue,  or 
an  insufficient  reserve  of  attention  available  for  subsequent 
demanding  tasks. 

In  accordance  with  Kahneman's  effortful  attention  model 
(1973),  there  is  an  inverse  relationship  between  the  effort 
supplied  to  the  main  task  and  the  spare  capacity  or  effort 
available  for  processing  subsequent  tasks.  Other  researchers 
in the cognitive area (e.g., Eysenck, 1983; Hasher & Zacks,
1979)  have  shown  that  people  experiencing  high  levels  of 
stress  tend  to  rely  on  automatic  processing  operations  rather 
than  controlled  or  effortful  operations.  Automatic 
processing  proceeds  without  subject  control  or  intention, 
does  not  interfere  with  other,  ongoing  cognitive  activity, 
operates  at  a  constant  level  under  all  conditions,  and 
therefore does not stress the capacity limitations of the
cognitive system. In contrast, controlled processing
operations are under conscious control of the subject and are
therefore capacity-limited and drain cognitive energy (Hasher
& Zacks, 1979; Posner & Snyder, 1975; Schneider & Shiffrin,
1977). Stress (a drain on cognitive energy) is believed to
increase a person's reliance on the automatic mode of
information processing, since this mode requires less
cognitive energy.

A  possible  consequence  of  this  tendency  to  rely  on 
automatic  processing  in  a  performance  rating  situation  is 
that  stressed  individuals  will  rate  another's  performance 
based  on  an  initial  overall  impression  of  general 
effectiveness  (based  on  prototypic  attributions),  rather  than 
attend  to  specific  behavioral  dimensions,  since 
such  attention  would  require  a  controlled  processing  mode. 

In  summary,  the  psychic  cost  hypothesis  would  predict  that  a 
stressed  rater  will  provide  ratings  of  a  single  ratee  that 
show  less  variability  across  dimensions  (i.e.,  greater  halo 
error)  than  ratings  of  that  same  ratee  provided  by  a  rater 
who has not been stressed (Srinivas & Motowidlo, 1985).

The  Frustration-Mood  Hypothesis.  This  theory  (Cohen, 
1980)  states  that  exposure  to  stressors  influences  behavior, 
particularly  social  behaviors,  by  affecting  mood.  Stressed 
individuals  experience  feelings  of  frustration,  annoyance, 
and  irritation,  which  result  in  less  motivation  to  perform 
subsequent  tasks,  and  in  less  sensitivity  to  the  needs  of 
others.  Negative  mood  states  also  result  in  increased 
aggression  and  other  undesirable  interpersonal  behaviors. 

Mood  states  have  been  found  to  affect  information  processing 
in  two  ways:  by  influencing  the  type  of  information  attended 
to,  and  by  influencing  the  kind  of  information  retrieved  from 
memory. 

The  effects  of  emotional  states  on  the  kind  of 
information  attended  to  have  been  widely  examined.  It  is 
consistently  found  that  individuals  attend  to  material 
congruent with their current mood (Bower, 1981; Bower &
Cohen, 1982). Thus stressed raters are more likely to attend
to negative information about the ratee's performance.

With  regard  to  the  effects  of  mood  on  information 
retrieval,  it  has  been  found  that  people  are  more  likely  to 
recall  information  that  is  congruent  with  their  mood  (Bower, 
1981;  Clark,  Milberg,  &  Ross,  1983;  Isen,  Shalker,  Clark,  & 
Karp,  1978).  Stressed  raters  experiencing  negative  mood 
states  are  likely  to  retrieve  or  recall  negative  information 
about the ratee's performance, and will form less favorable
judgments of that performance (Srinivas & Motowidlo, 1985).

In  addition  to  the  accuracy  of  stressed  individuals' 
ratings,  another  question  concerning  the  effect  of  stress  on 
the  cognitive  appraisal  process  is  at  what  stage  the  process 
is affected. For example, DeNisi, Cafferty, and Meglino
(1984) point out that raters facing a major stressor (time
pressures) will seek fewer but more information cues, thereby
reducing  the  "marginal  cost"  of  gathering  information.  This 
implies  that  it  is  the  input  phase  of  processing  that  is 
affected.  This  type  of  narrowed  search  activity  might  also 
occur  during  retrieval  of  information.  If  stress  does 
influence  the  process,  the  impact  could  be  on  the  input  phase 
(observation  and  storage),  the  retrieval  phase,  or  both  of 
these.

Srinivas and Motowidlo (1985) attempted to answer these
questions  (the  psychic  cost  and  frustration-mood  predictions 
and  the  input  vs.  retrieval  questions).  In  a  simulated  work 
setting,  using  a  stressful  vs.  nonstressful  in-basket  task  as 
the  stress  manipulation,  and  order  of  information 
presentation  as  the  input  vs.  retrieval  manipulation,  they 
had  subjects  rate  the  videotaped  performance  of  a 
subordinate.  Subjects  who  were  stressed  prior  to  observing 
and  rating  the  performance  were  affected  by  that  stress 
during  the  input  phase  of  the  process,  whereas  subjects 
stressed  after  the  observation  but  prior  to  rating  the 
performance  were  affected  during  the  retrieval  phase. 
Dependent  variables  were  severity  and  dispersion  of  ratings 
across  performance  dimensions  (i.e.,  halo  error).  Ratings 
provided  by  stressed  subjects  showed  less  dispersion  across 
performance  dimensions  (i.e.,  more  halo),  but  no  difference 
in favorability. Furthermore, the effects of stress on
dispersion  were  significant  only  within  the  retrieval 
condition,  suggesting  that  stress  affects  the  retrieval  phase 
of  information  processing. 

Unfortunately,  there  is  a  major  confound  present  in 
their  design.  Half  of  the  subjects  (those  in  the  retrieval 
condition) observed the performance, then performed a 45-minute
in-basket and several other questionnaire tasks before
rating the performance they had observed. The time between
observation  and  rating  of  the  performance  in  this  condition 
was  at  least  one  hour.  The  subjects  in  the  input  condition, 
however,  performed  the  45-minute  in-basket  prior  to  observing 
the  performance,  and  then  rated  the  performance  immediately 
after  the  observation.  The  time  between  observation  and 
rating  for  this  group  was  less  than  10  minutes.  Since 
stressed  subjects  in  the  retrieval  (one  hour)  group  had 
significantly  less  dispersion  than  those  in  the  input 
(10  minute)  group,  it  is  possible  that  the  difference  in  the 
time  delay  for  the  two  groups  is,  at  least  in  part, 
responsible  for  this  finding. 

Hypotheses 

The  present  study  attempted  to  address  similar  issues  to 
those  investigated  by  Srinivas  and  Motowidlo  (1985),  with  an 
altered  experimental  design  to  correct  for  the  time  delay 
confound  in  their  study.  We  also  looked  at  the  effects  of 
stress  on  a  general  measure  of  rating  accuracy. 

Hypothesis  1:  The  psychic  cost  theory  predicts  that 
stressed  individuals  will  be  more  likely  to  operate  in  an 
automatic  mode  of  cognitive  processing  in  an  effort  to 
conserve  cognitive  energy.  They  will  be  more  likely  to  base 
evaluative judgments of others' performance on overall
impressions,  rather  than  attend  to  or  recall  specific  aspects 
of  that  performance.  Therefore,  a  main  effect  of  stress 
level  on  amount  of  halo  error  is  predicted.  Raters 
experiencing  higher  levels  of  stress  should  exhibit  a  greater 
amount  of  halo  error  (decreased  dispersion)  in  rating  the 
performance of subordinates.

Hypothesis 2: The frustration-mood theory maintains
that  stressed  raters  will  experience  a  more  negative  mood 
state,  and  that  this  in  turn  will  color  judgments  about  the 
performance  of  others  in  a  negative  way.  This  may  be  due  to 
a  tendency  for  such  individuals  to  seek  out  and  attend  to 
negative  information,  or  to  simply  recall  more  negative 
information (Bower, 1981). Therefore, it is expected that
raters  experiencing  higher  levels  of  stress  should  exhibit 
more  severity  in  rating  the  performance  of  subordinates. 

Hypothesis 3: Implicit in Hypotheses 1 and 2 is the
idea  that  stress  affects  rating  accuracy.  Further,  both  the 
psychic  cost  and  frustration-mood  theories  suggest  that  under 
high  stress  conditions,  raters  do  not  use  all  information 
available  to  them  in  making  performance  ratings.  If  either 
the  psychic  cost  or  frustration-mood  principles  are  operating 
during  stressful  conditions,  it  is  expected  that  raters  would 
be  able  to  recall  significantly  less  information  about  the 
ratee's  performance  than  raters  observing  and  retrieving 
information  under  conditions  of  low  stress.  This  recall 
inhibition  under  stress  may  be  due  to  the  rater's  decreased 
attentional  capacities,  or  faulty  memory. 

Research  question:  The  effect  of  rater  stress  on  rating 
distortion  may  be  observation-based,  recall-based,  or  both. 

If  it  is  observation-based,  it  is  expected  that  the  input 
phase  will  be  more  greatly  affected  by  stress.  If  distortion 
due  to  stress  is  a  recall-based  phenomenon,  it  is  expected 
that  the  retrieval  phase  will  be  more  greatly  affected  by 
stress.  This  question  will  also  be  examined,  although  no 
specific  predictions  are  made. 

An  additional  question  concerns  the  role  of  rater 
cognitive  complexity  as  a  possible  moderator  of  the  effects 
of  rater  stress  on  rating  distortion.  Cognitive  complexity 
or  selectivity  is  defined  as  the  ability  to  differentially 
attend  to  multidimensional  stimuli  (Cardy  &  Kehoe,  1984). 
These  authors  have  found  that  raters  high  on  this 
characteristic tend to provide more accurate appraisals than
other  raters.  However,  Bernardin,  Cardy,  and  Carlyle  (1982) 
found  no  evidence  to  this  effect.  It  seems  reasonable  to 
hypothesize  that  cognitively  complex  raters  would  be  better 
able  to  process  information  even  if  stressed,  while 
cognitively  simple  raters  would  be  more  likely  to  fall  back 
on  simple  prototypes  when  stressed.  This  study  will  examine 
this  question  in  an  exploratory  fashion. 

METHOD 


Sample 

The  sample  consisted  of  84  (53  male  and  31  female) 
introductory  psychology  students  who  were  randomly  assigned 
to  one  of  four  experimental  conditions  in  a  2  x  2  design. 
Students  participated  in  order  to  fulfill  course 
requirements. 

Design 

Research  participants  were  told  that  the  experiment  was 
part  of  a  project  concerned  with  developing  exercises  for  a 
managerial assessment center. They would assume the role of
a sales manager, and complete an in-basket exercise lasting
35 minutes, and several other tasks, including a performance
rating.  The  study  was  a  2  x  2  design,  in  which  two  levels  of 
work  stress  (high  vs.  low)  were  crossed  with  the  timing  of 
the  stress  and  performance  information  presentation--during 
the  input  phase  (stress  level  introduced  before  performance 
observation  and  ratings)  or  in  the  retrieval  phase  (stress 
level  introduced  after  performance  observation  but  before 
ratings).

Independent Variables

Stress  conditions.  Stress  was  manipulated  using  two 
versions  of  an  in-basket  exercise.  One  version  was  more 
difficult  and  required  greater  information  processing  (high 
stress)  than  the  other  version  (low  stress).  Participants 
completing  the  high  stress  in-basket  were  interrupted 
frequently  with  additional  information  presented  on  a 
videotape  depicting  visits  to  the  manager's  office  by  his  or 
her  superior,  subordinates,  and  others  who  provided 
additional  information  usually  concerned  with  the  in-basket 
materials.  The  interruptions  consisted  of  an  intercom  buzz, 
a  secretary  announcing  a  visitor,  the  visitor  entering  the 
"office",  then  presenting  the  information,  question,  etc. 
Messages  ranged  in  importance  from  those  concerning  major  and 
immediate  production  problems,  which  called  for  immediate 
attention,  to  office  gossip.  Interruptions  were  made  at 
variable  time  intervals  and  were  of  variable  durations 
throughout the in-basket exercise. In this high stress
group, in-basket materials included problems concerning
interdepartmental  conflicts,  production  delays,  supervisor- 
subordinate  problems,  and  general  information  about  routine 
operations.  In  addition,  these  subjects  were  told  that  they 
must  complete  the  exercise  in  35  minutes. 

Participants  in  the  low  stress  condition  completed  a 
less  complicated  version  of  the  in-basket,  consisting  of 
problems that were more routine. No interruptions occurred,
and  participants  were  told  to  complete  as  much  of  the  in- 
basket  as  possible,  but  that  it  was  not  mandatory  that  they 
complete  all  of  it. 

Timing  of  stress  presentation  conditions.  The  timing  of 
the  stress  presentation  variable  was  manipulated  by 
presenting  half  the  participants  with  the  performance 
information,  via  videotape,  before  they  worked  on  the  in- 
basket  (stress  was  introduced  during  the  retrieval  phase)  and 
presenting  half  the  participants  with  the  performance 
videotape  after  they  had  completed  the  in-basket  (stress 
introduced  during  the  input  phase).  The  performance 
videotape  used  was  one  developed  and  scored  by  Borman,  et  al. 
(1976).  This  tape  depicts  a  manager  dealing  with  a 
subordinate  in  an  appraisal  interview.  The  tape  lasted  about 
8  minutes. 

Manipulation  Checks 

Manipulation  checks  for  stress  were  measures  of  pulse 
rate  immediately  after  the  in-basket  exercise,  and  subjective 
stress. Subjective stress was assessed using a questionnaire
containing items from Srinivas and Motowidlo (1985) and the
Job-related  Tension  Scale  (Kahn,  Wolfe,  Quinn,  Snoek,  & 
Rosenthal,  1964).  Internal  consistency  was  .88  for  this 
sample.  Example  items  from  the  stress  questionnaire  are 
presented  in  Table  1.  For  items  in  Part  1,  subjects 
responded  using  a  5-point  "strongly  agree"  to  "strongly 
disagree"  scale.  The  items  were  scored  so  that  a  high  score 
indicated  a  high  level  of  subjective  stress.  For  items  in 
Part  2,  subjects  responded  using  a  5-point  "never"  to  "nearly 
all  the  time"  scale.  Items  from  Part  1  and  2  were  summed  to 
give  a  total  subjective  stress  score. 
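
The scoring rule described above can be sketched in a few lines of Python. This is only an illustration, not the authors' scoring procedure: the function name and the raw response coding are assumptions. In particular, it assumes Part 1 responses are recorded as 1 = "strongly agree" through 5 = "strongly disagree" (and therefore must be reversed so that a high score means high stress), and that Part 2 responses are recorded as 1 = "never" through 5 = "nearly all the time".

```python
def subjective_stress_score(part1, part2, part1_agree_is_low=True):
    """Sum Part 1 and Part 2 responses into one subjective stress score.

    part1: responses on the 5-point agree/disagree scale
    part2: responses on the 5-point frequency scale
           (assumed coding: 1 = "never", 5 = "nearly all the time")
    part1_agree_is_low: hypothetical flag; True means the raw coding is
           1 = "strongly agree", so items are reversed before summing.
    """
    for response in list(part1) + list(part2):
        if response not in (1, 2, 3, 4, 5):
            raise ValueError("responses must be integers from 1 to 5")

    if part1_agree_is_low:
        # Reverse so that agreeing with a stress item yields a high score.
        part1_scored = [6 - response for response in part1]
    else:
        part1_scored = list(part1)

    return sum(part1_scored) + sum(part2)


if __name__ == "__main__":
    # Three Part 1 items and two Part 2 items from one made-up respondent.
    print(subjective_stress_score([1, 1, 5], [5, 3]))
```

If the raw data were already coded so that agreement is the high end of the scale, the flag would be set to False and no reversal would be applied.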

Table 1

Example Items from the Subjective Stress Questionnaire

Part 1:

I did not have enough time to complete the in-basket.

I felt like taking a break while working on the in-basket.

I was irritated while completing the in-basket.

I was overwhelmed by all the information that was present
in the in-basket.

I was tired while going through the in-basket.

I felt very tense while going through the in-basket.

Part 2:

How frequently during the in-basket were you bothered by:

Being unclear on just what the scope and responsibilities
of your task were?

Feeling that you had too heavy a work load, one that you
couldn't possibly finish in the time allotted?

Feeling that you weren't capable of handling the job?

Thinking that the amount of work you had to do was
interfering with how well it got done?


The Multiple Affect Adjective Checklist (MAACL;
Zuckerman & Lubin, 1965) was used to assess mood states. This
measure  asked  respondents  to  check  the  adjectives  that 
described  the  way  they  felt  "right  now".  Examples  of 
adjectives  on  the  MAACL  are:  calm,  angry,  disgusted, 
friendly,  blue,  and  happy.  The  MAACL  was  scored  for  anxiety, 
hostility,  and  depression.  The  internal  consistency 
reliabilities  for  these  scales  were  .68,  .63,  and  .69, 
respectively.  Participants  were  also  asked  directly  how  well 
they  believed  they  performed  on  the  in-basket  exercise.  It 
was  expected  that  those  subjects  in  the  high  stress 
conditions  would  be  more  likely  to  doubt  the  quality  of  their 
performance. A single item, "How well do you think you
performed on the in-basket?", was used to measure self-rated
performance. A 5-point scale (1 = very poorly to 5 = very well)
was used.

Dependent  Variable 

The  performance  rating  scales  used  were  those  developed 
by  Borman,  et  al.  (1976)  to  accompany  the  performance 
videotape.  They  are  behaviorally  anchored  and  include  7 
different  dimensions  of  performance:  structuring  the 
interview,  establishing  and  maintaining  rapport,  reaction  to 
stress,  obtaining  information,  resolving  conflict,  developing 
the  subordinate,  and  motivating  the  subordinate.  Raters  were 
given  definitions  of  each  dimension  and  asked  to  rate  the 
manager  on  each  dimension  using  a  1  to  7  scale  (1  being 
lowest  performance). 

Three measures of rating distortion were collected.
These measures were halo error, severity error, and rating
accuracy. Halo error was defined as the degree of dispersion
exhibited by a rater across performance dimensions.
Dispersion for each rater was calculated as the variance
across dimensions of the rater's ratings (Borman, 1975). A
low dispersion score indicated halo error. Accuracy was
defined  as  the  sum  of  the  squared  differences  between  the 
rater's  ratings  of  the  subordinate  on  a  particular  dimension, 
and  the  subordinate's  true  score  on  that  dimension.  A  low 
difference  score  indicated  accurate  ratings.  True  scores  of 
the  stimulus  performance  were  developed  by  Borman  et  al. 
(1976)  and  were  defined  as  the  mean  expert  rating  given  on 
the  particular  dimension  by  the  industrial  experts  who  served 
as  judges  during  the  validation  of  the  accompanying  rating 
scales.  Severity  was  defined  as  the  extent  to  which  mean 
performance  rating  across  all  dimensions  differed  from  true 
mean  performance  across  all  dimensions.  A  positive  score 
indicated  leniency,  while  a  negative  score  indicated 
severity. 
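
The three distortion measures defined above can be illustrated with a minimal Python sketch. The function names and the example numbers are hypothetical; the true scores shown are made up for illustration and are not the Borman et al. (1976) values. The text does not say whether the variance used n or n - 1 in the denominator, so the population variance used here is an assumption.

```python
from statistics import mean, pvariance


def dispersion(ratings):
    """Variance of one rater's ratings across dimensions.

    A low value indicates halo error. Population variance (divide by n)
    is an assumption; the source does not specify the denominator.
    """
    return pvariance(ratings)


def accuracy(ratings, true_scores):
    """Sum of squared differences from the true scores (low = accurate)."""
    if len(ratings) != len(true_scores):
        raise ValueError("ratings and true_scores must have equal length")
    return sum((r - t) ** 2 for r, t in zip(ratings, true_scores))


def severity(ratings, true_scores):
    """Mean rating minus mean true score.

    Positive values indicate leniency, negative values indicate severity.
    """
    if len(ratings) != len(true_scores):
        raise ValueError("ratings and true_scores must have equal length")
    return mean(ratings) - mean(true_scores)


if __name__ == "__main__":
    # One rater's ratings on the 7 dimensions (1 to 7 scale), made up.
    ratings = [4, 4, 5, 4, 4, 5, 4]
    # Hypothetical expert "true scores" for the same 7 dimensions.
    true_scores = [3.0, 5.0, 4.0, 2.0, 6.0, 5.0, 3.0]

    print("dispersion:", dispersion(ratings))
    print("accuracy:", accuracy(ratings, true_scores))
    print("severity:", severity(ratings, true_scores))
```

In this made-up case the rater uses only two adjacent scale points, so dispersion is small (suggesting halo), while the mean rating slightly exceeds the mean true score (slight leniency).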

In  addition  to  the  performance  ratings  described  above, 
all  participants  were  asked  to  complete  a  recognition  task, 
in  which  they  were  to  report  whether  they  remembered  specific 
aspects  of  the  ratee's  performance.  This  measure  consisted 
of  a  series  of  statements  containing  behavioral  descriptions 
of  the  ratee's  performance.  Respondents  were  asked  to 
indicate whether they recalled each behavior occurring in the
videotape by checking "yes" or "no" for that item. This
series  of  statements  included  contrived  behaviors  as  well  as 
those  actually  performed  by  the  stimulus  person.  The 
recognition  score  was  simply  the  number  of  behavioral 
descriptions correctly recognized, with a high score
indicating  better  recognition. 
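
As a rough illustration of how such a recognition score might be computed, the sketch below counts an item as correctly recognized when the respondent checks "yes" for a behavior that actually occurred or "no" for a contrived one. That reading of "correctly recognized" is an assumption, as are the function and variable names; the text does not spell out how contrived items were treated.

```python
def recognition_score(responses, actually_occurred):
    """Count items where the yes/no response matches what was on the tape.

    responses: list of bools, True if the respondent checked "yes"
    actually_occurred: list of bools, True if the behavior was really
        performed by the stimulus person, False if it was contrived
    """
    if len(responses) != len(actually_occurred):
        raise ValueError("one response is required per item")
    return sum(
        1
        for said_yes, occurred in zip(responses, actually_occurred)
        if said_yes == occurred
    )


if __name__ == "__main__":
    # Four made-up items: the third is contrived but was endorsed.
    responses = [True, False, True, True]
    answer_key = [True, False, False, True]
    print(recognition_score(responses, answer_key))
```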

Participants  were  also  asked  to  complete  a  neutral  task 
lasting 45 minutes. This neutral task consisted of a 10-minute
water break and completion of Bieri's grid form of the
role repertory test (Bieri, Briar, Leaman, Miller, &
Tripodi, 1966), which was also the measure of cognitive
complexity.  Its  internal  consistency  reliability  was  .78. 
This  measure  asked  respondents  to  list  the  person  they  knew 
who best fit one of eight different definitions (e.g.,
"member of the opposite sex whom you admire most," "member
of the opposite sex whom you find hard to like"). Then,
they  were  asked  to  rate  each  of  these  eight  people  on  a 
number  of  personality  traits.  The  measure  was  scored  for  the 
extent  to  which  respondents  differentially  rated  different 
people,  using  a  grid  system.  Lower  scores  on  this  measure 
indicated  greater  cognitive  complexity. 

Procedure 

All  participants  were  given  a  brief  introduction  to  the 
study,  including  a  cover  story,  and  asked  to  sign  a  statement 
of  informed  consent.  A  measure  of  their  pulse  was  taken. 

Retrieval  conditions.  Half  of  the  participants  in  the 
high  stress  group  and  half  in  the  low  stress  group  observed 
the  performance  videotape  prior  to  working  on  the  in-basket. 
Stress  was  introduced  to  these  individuals  during  the 
retrieval  phase  of  the  process.  Before  being  shown  the 
videotape,  however,  these  participants  completed  the  neutral 
task.  Another  pulse  measure  was  taken  following  completion 
of  the  neutral  task.  Next,  the  participants  worked  on  the 
in-basket  exercise  for  35  minutes.  Upon  completion  of  the 
exercise,  another  pulse  rate  was  taken,  and  subjects  filled 
out  the  short  form  of  the  MAACL  (Zuckerman  &  Lubin,  1965). 
They  also  completed  the  subjective  stress  questionnaire. 
Finally,  participants  rated  the  performance  of  the  manager  in 
the  performance  videotape  on  the  rating  scales,  and  completed 
the  recognition  task. 

Input  conditions.  The  other  half  of  the  two  stress 
groups  went  through  the  following  sequence  after  a  brief 
introduction  to  the  experiment:  pulse  rate,  in-basket 
exercise,  second  pulse  rate,  MAACL,  subjective  stress 
questionnaire,  performance  videotape  observation,  neutral 
task,  pulse  rate,  completion  of  the  rating  forms,  and  the 
recognition  task.  For  these  individuals,  stress  was 
introduced  during  the  input  phase  of  the  process. 

All subjects were asked not to smoke, run, or drink
caffeinated beverages during the water break. Adding the
45-minute neutral task to both groups made the time interval
between the observation and the rating of the performance
equal for the retrieval and input conditions. There was a time
delay of about 50 minutes for all participants (see Table 2).
All participants were debriefed and thanked at the end of the
session.

RESULTS 

The  means  and  standard  deviations  for  all  experimental 
measures are shown in Table 3. Correlations between measures
are given in Table 4.

Manipulation  Checks 

A multivariate analysis of variance was conducted on
negative mood (anxiety, hostility, depression), subjective
stress, post-in-basket pulse rate, and self-ratings of
in-basket performance, to ensure that the experimental stress
groups differed in their stress levels. The main effect for
stress was significant as expected [F(6, 76) = 12.50, p < .0001].
The means for each variable under the two stress conditions
are presented in Table 5. Baseline pulse rate, measured
before the experimental manipulation, was a covariate in both
the MANOVA and subsequent ANOVAs. Univariate ANOVAs
indicated that stress groups differed significantly in
anxiety level [F = 8.21, df = 1, 81, p < .01], hostility
[F = 10.10, df = 1, 81, p < .01], subjective stress [F = 33.78,
df = 1, 81, p < .001], and post-in-basket pulse rate [F = 40.91,
df = 1, 81, p < .001].

The  stress  manipulation  did  not  produce  differences  in 
depression  level  or  self-ratings  of  performance.  High  stress 
groups  experienced  significantly  greater  subjective  stress, 
increased  pulse  rate,  anxiety,  and  hostility  as  predicted. 


Table 2

Summary of the Procedure Sequence for Different
Experimental Conditions

     Retrieval Condition        Time (Min)   Input Condition            Time (Min)

 1.  Introduction                    5       Introduction                    5
 2.  Measurement of pulse rate       2       Measurement of pulse rate       2
 3.  Neutral task                   35       In-basket                      35
 4.  Measurement of pulse rate       2       Measurement of pulse rate       2
 5.  Water break                    10       MAACL                           5
 6.  Observation of ratee           10       Subjective stress measure       5
 7.  In-basket                      35       Observation of ratee           10
 8.  Measurement of pulse rate       2       Water break                    10
 9.  MAACL                           5       Neutral task                   35
10.  Subjective stress measure       5       Measurement of pulse rate       2
11.  Performance rating             15       Performance rating             15
12.  Recognition task                5       Recognition task                5

     Total                         131                                     131
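The timing claims behind Table 2 can be checked with simple arithmetic. The following is a minimal sketch, not part of the original procedure: the step durations are transcribed from Table 2, the Introduction time for the input condition is taken as 5 minutes (the value implied by the 131-minute total), and the function and variable names are illustrative assumptions.

```python
# Step durations (minutes) transcribed from Table 2.
# Names such as RETRIEVAL and delay_minutes are illustrative, not from the study.
RETRIEVAL = [
    ("Introduction", 5), ("Pulse rate", 2), ("Neutral task", 35),
    ("Pulse rate", 2), ("Water break", 10), ("Observation of ratee", 10),
    ("In-basket", 35), ("Pulse rate", 2), ("MAACL", 5),
    ("Subjective stress measure", 5), ("Performance rating", 15),
    ("Recognition task", 5),
]
INPUT = [
    ("Introduction", 5),  # assumed: implied by the 131-minute total
    ("Pulse rate", 2), ("In-basket", 35), ("Pulse rate", 2), ("MAACL", 5),
    ("Subjective stress measure", 5), ("Observation of ratee", 10),
    ("Water break", 10), ("Neutral task", 35), ("Pulse rate", 2),
    ("Performance rating", 15), ("Recognition task", 5),
]


def total_minutes(steps):
    """Total session length in minutes."""
    return sum(minutes for _, minutes in steps)


def delay_minutes(steps, start="Observation of ratee", end="Performance rating"):
    """Minutes between the end of `start` and the beginning of `end`."""
    names = [name for name, _ in steps]
    i, j = names.index(start), names.index(end)
    return sum(minutes for _, minutes in steps[i + 1:j])


if __name__ == "__main__":
    for label, steps in (("retrieval", RETRIEVAL), ("input", INPUT)):
        print(label, total_minutes(steps), delay_minutes(steps))
```

Under these assumptions both sequences have the same total length and the same observation-to-rating delay, which is the property the neutral task was added to guarantee.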


Table 3

Means and Standard Deviations of Experimental Measures*

Measure                                  Mean       SD

Baseline pulse                          72.57    12.25
Post-in-basket pulse                    72.31    11.56
Neutral task pulse                      68.01     9.09
In-basket performance (self-rating)      3.39     0.97
Anxiety                                  8.26     3.55
Depression                              14.57     4.61
Hostility                                9.68     3.25
Cognitive complexity                    73.83    19.43
Subjective stress                       43.88    10.02
Recognition                             25.02     3.46
Severity                                0.162     0.85
Halo                                     1.31     0.43
Accuracy                                24.67    10.91

*Possible subjective stress scores ranged from 16 to 80.
Cognitive complexity scores ranged from 39 to 144
(lower values indicate higher complexity).
Possible recognition scores ranged from 0 to 33.
Severity scores ranged from -1.62 to 1.74
(negative values indicate severity).
Halo scores ranged from 0 to 2.25
(lower values indicate halo).
Accuracy scores ranged from 3.77 to 54.85
(lower scores indicate accuracy).
Table 4

Pearson Correlation Coefficients for Experimental Measures

Measure                   1       2       3       4       5       6       7       8       9

1. Anxiety                --      .72***  .70***  .04     .64***  -.07    .27**   .07     .29**
2. Depression                     --      .67***  .08     .41     -.08    .19     .05     .19
3. Hostility                              --      .07     .51***  -.17    .09     -.04    .19
4. Cognitive complexity                           --      .13     -.06    .09     -.04    .32**
5. Subjective stress                                      --      -.17    .17     .01     .32**
6. Recognition                                                    --      -.01    .11     .22*
7. Severity                                                               --      .12     .18
8. Halo                                                                           --      .09
9. Accuracy                                                                               --

Recognition: higher scores indicate better recall.
Severity: negative scores indicate severity.
Halo: lower scores indicate halo.
Accuracy: lower scores indicate accuracy.

*** p < .001
**  p < .01
*   p < .05


Table 5

Cell Means for Manipulation Check Variables under High and
Low Stress Conditions

                              High Stress Condition    Low Stress Condition
                                    (N = 42)                 (N = 42)
Variables                         Mean    (SD)             Mean    (SD)

Anxiety                           9.31    (3.73)           7.1[?]  (3.05)
Depression                       14.79    (4.52)          14.36    (4.74)
Hostility                        10.74    (2.91)           8.62    (3.25)
Subjective stress                49.29    (8.53)          38.48    (8.42)
In-basket pulse                  77.71   (10.36)          66.90   (10.16)
In-basket performance
  (self-rating)                   3.24    (0.65)           3.55    (1.06)

[?] = digit illegible in the source.
Hypothesis 1

A  two-way  analysis  of  variance  was  conducted  on  halo  in 
ratings,  with  stress  level  (high  vs.  low)  and  timing  of 
stress  presentation  (input  vs.  retrieval)  serving  as 
independent  variables.  The  cell  means  and  standard 
deviations  for  halo  are  presented  in  Table  6.  This  analysis 
revealed  no  significant  main  effects  of  stress  level  or 
timing  of  stress  presentation  on  halo.  No  significant 
interaction  was  observed.  It  appears  that  neither  stress  nor 
timing  of  the  stress  manipulation  had  any  impact  on  the  halo 
error  exhibited  by  raters. 

Hypothesis  2 

A  two-way  analysis  of  variance  was  conducted  on  the 
severity  of  ratings.  Cell  means  and  standard  deviations  for 
this  analysis  are  presented  in  Table  6.  Results  reveal  no 
significant  main  effects  of  stress  level  or  timing  of  stress 
presentation  on  severity.  No  significant  interaction  was 
found.  Neither  stress  level  nor  timing  of  stress  had  any 
significant  impact  on  the  severity  with  which  raters  assigned 
performance  ratings. 

Hypothesis  3 

A  two-way  analysis  of  variance  was  conducted  on  the 
accuracy  of  ratings.  Cell  means  and  standard  deviations  are 
presented in Table 6. This analysis revealed significant main
effects of stress level [F = 6.99, df = 1, 80, p < .05] and timing
of stress presentation [F = 5.36, df = 1, 80, p < .05] on rating
accuracy. No significant interaction effect was found. These
results  indicate  that,  as  stress  level  increased,  rating 
accuracy  decreased,  regardless  of  when  that  stress  was 
introduced.  Rating  accuracy  also  was  significantly  lower  when 
stress  (either  high  or  low)  was  introduced  during  the 
retrieval  phase,  as  compared  to  the  input  phase. 

An  additional  two-way  ANOVA  was  conducted  on  information 
recognition,  with  stress  level  and  timing  of  stress 
manipulation  again  serving  as  independent  variables.  Cell 
means  and  standard  deviations  for  this  analysis  are  presented 
in Table 6. Results indicate a significant main effect of
timing of stress manipulation on recognition [F = 5.92,
df = 1, 80, p < .05] and a significant interaction between stress
and timing [F = 4.43, df = 1, 80, p < .05]. Raters in the retrieval
condition correctly recognized significantly less performance
information than those in the input condition. In addition,
this difference was much greater for raters in the low stress
conditions (see Table 6). It appears that, under stressful
conditions, timing does not have a substantial effect on
information recall, but under conditions of low stress,
recall is much worse in the retrieval condition. High stress
appears to have lessened the effect of timing.
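The pattern behind the recognition interaction can be illustrated directly from the cell means in Table 6. The sketch below is descriptive only and is not the ANOVA reported above; the dictionary layout and function names are assumptions introduced for illustration.

```python
# Cell means from Table 6 (21 raters per cell), keyed by (stress, timing).
RECOGNITION = {
    ("high", "input"): 25.00, ("high", "retrieval"): 24.76,
    ("low", "input"): 26.81, ("low", "retrieval"): 23.52,
}
ACCURACY = {  # lower scores indicate more accurate ratings
    ("high", "input"): 26.22, ("high", "retrieval"): 29.06,
    ("low", "input"): 17.93, ("low", "retrieval"): 25.48,
}


def timing_effect(cells, stress):
    """Input mean minus retrieval mean within one stress level."""
    return cells[(stress, "input")] - cells[(stress, "retrieval")]


def interaction_contrast(cells):
    """Difference between the low-stress and high-stress timing effects."""
    return timing_effect(cells, "low") - timing_effect(cells, "high")


def marginal_mean(cells, factor_index, level):
    """Unweighted marginal mean; valid here because cell sizes are equal."""
    values = [m for key, m in cells.items() if key[factor_index] == level]
    return sum(values) / len(values)


if __name__ == "__main__":
    for stress in ("high", "low"):
        print(stress, round(timing_effect(RECOGNITION, stress), 2))
    print("contrast", round(interaction_contrast(RECOGNITION), 2))
```

For recognition, the input-minus-retrieval difference is small under high stress and several points larger under low stress, which is the descriptive shape of the interaction described in the text.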

The  question  concerning  rater  cognitive  complexity  was 
examined  by  comparing  the  correlations  between  cognitive 
complexity  and  each  of  the  rating  variables  (i.e.,  halo, 
severity,  accuracy  and  recognition)  under  different 
experimental  conditions.  These  correlations  are  presented  in 
Table 7. The correlation between cognitive complexity and
rating accuracy was significant in the high stress condition
(regardless  of  timing  of  stress  presentation),  and  in  the 
input  condition  (regardless  of  stress  level).  No  other 
correlations  reached  significance.  Cognitively  complex 
raters  were  more  accurate  than  cognitively  simple  raters, 
especially  under  high  stress  or  when  stressed  before  input. 
Correlations  were  transformed  to  Fisher's  Z  values,  and  a 
test  of  significance  of  the  difference  between  correlations 
in  each  stress  condition  and  each  timing  condition  was 
conducted.  There  were  marginally  significant  differences 
between  correlations  of  cognitive  complexity  and  halo, 
recognition,  and  accuracy  across  stress  levels  (p<.10). 
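The comparison of correlations via Fisher's Z can be sketched as follows. This is a minimal illustration assuming the standard test for two independent correlations; the source does not state whether the test was one- or two-tailed, so the function returns a two-tailed p value as an assumption. The example values are the accuracy correlations under high and low stress from Table 7 (N = 42 each), reading the table columns in the order high stress, low stress, input, retrieval.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Test the difference between two independent Pearson correlations.

    Returns the z statistic and a two-tailed p value under the
    standard normal distribution.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)      # Fisher r-to-Z transform
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_tailed


if __name__ == "__main__":
    # Cognitive complexity vs. accuracy: high stress r = .44, low stress r = .11
    z, p = fisher_z_difference(0.44, 42, 0.11, 42)
    print(f"z = {z:.2f}, two-tailed p = {p:.3f}")
```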


Table 6

Cell Means for Dependent Variables by Stress
and Timing of Stress Presentation Conditions

                         High Stress                        Low Stress
                  Input           Retrieval           Input           Retrieval
                 (N = 21)         (N = 21)           (N = 21)         (N = 21)
Variables       Mean   (SD)      Mean   (SD)        Mean   (SD)      Mean   (SD)

Favorability     .15   ([?])     [?]    (1.05)      [?]    (.69)     [?]    ([?])
Dispersion      1.39   (.44)     1.21   (.39)       1.29   (.48)     1.34   (.38)
Accuracy       26.22   (11.74)  29.06   (11.72)    17.93   (8.56)   25.48   (8.64)
Recognition    25.00   (4.17)   24.76   (3.70)     26.81   (2.58)   23.52   (2.50)

[?] = value illegible in the source.

Table 7

Pearson Correlation Coefficients between Cognitive Complexity
and Rating Variables under Different Experimental Conditions

                        Stress                     Timing
                  High        Low           Input       Retrieval
                 (N = 42)    (N = 42)      (N = 42)     (N = 42)
Variables           r           r             r            r

Halo              [?]         [?]           [?]          .71[?]
Favorability      .24         .06           .16          .13
Accuracy          .44**       .11           .40**        .29
Recognition       .09        -.29           .20          .06

** p < .01
*  p < .05
[?] = value illegible or doubtful in the source.


DISCUSSION 

Hypothesis  1 

Stress level had no effect on the halo error in ratings.
Halo was defined as the variance across dimensions of the
rater's ratings. This finding does not support the
psychic-cost theory, which holds that raters will rely on
global impressions of overall performance when rating
individual aspects of performance under stressful conditions.
Halo did not correlate significantly with any of the other
experimental measures. It may be that halo was not an
appropriate measure of distortion under these conditions, and
that distortion was manifested in some other way, such as low
rating accuracy. This will be discussed in a later section.
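The halo measure defined above can be sketched in a few lines. This is an illustrative assumption rather than the study's scoring procedure: the source says only that halo is the variance of a rater's ratings across dimensions, so the choice of population variance and the example ratings are hypothetical.

```python
from statistics import pvariance


def halo_score(ratings):
    """Variance of one rater's ratings across performance dimensions.

    Lower values mean the rater gave similar ratings on every
    dimension, which is read as greater halo. Population variance
    is an assumption; the source does not specify the estimator.
    """
    return pvariance(ratings)


if __name__ == "__main__":
    print(halo_score([4, 4, 4, 4]))  # identical ratings: maximal halo
    print(halo_score([1, 4, 2, 5]))  # differentiated ratings: less halo
```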


Hypothesis  2 

Raters  in  the  high  stress  conditions  did  not  give  more 
severe  ratings  than  raters  in  the  low  stress  groups,  as  was 
predicted  by  the  frustration-mood  hypothesis.  Although 
stressed  raters  did  report  feeling  more  anxiety  and 
hostility,  these  mood  states  did  not  seem  to  influence  the 
severity  with  which  they  rated  the  stimulus  person.  In  fact, 
anxiety was inversely related to severity (r = .27, p < .01;
negative scores indicate severity), such that severity of
ratings decreased as anxiety
increased.  It  may  be  that  anxious  raters  sensed  this  mood 
state  in  themselves,  and  were  concerned  with  the  possibility 
of  it  affecting  their  ratings.  In  order  to  compensate  for 
this  and  prevent  it  from  happening,  they  may  have  given  more 
favourable  ratings  than  they  would  have  otherwise. 

Another explanation of this relationship is that
supervisors or evaluators who have no prior experience with
the stimulus task tend to see the task as more difficult and
to be more tolerant of subordinates' poor performance (Mitchell
& Kalb, 1982). These raters are therefore less likely to
make low performance ratings. The raters in this study were
not likely to have had any experience with the interviewer's
task, and may have attributed any poor performance to the
difficulty of the task, not to any ability or motivational
deficits of the ratee. The fact that more anxious raters had
just completed a difficult managerial task themselves may
have enhanced the perception of the ratee's task as
difficult, and made them more sympathetic to the ratee's
situation. This would account for the inverse relationship
between anxiety and severity error.

Another  possible  explanation  for  the  lack  of  support 
found  for  the  frustration-mood  hypothesis  is  that  feelings  of 
depression  are  a  main  determinant  of  severity  error.  Since 
high  stress  conditions  did  not  lead  to  significantly  greater 
depression,  there  would  be  no  reason  to  expect  favorability 
of  ratings  to  differ  on  the  basis  of  stress  level.  The  fact 
that  depression  may  be  considered  a  "down"  state,  whereas 
anxiety  and  hostility  are  considered  "up"  or  aroused  states 
may  partially  account  for  the  lack  of  a  significant  effect  of 
stress  level  on  ratings.  Rater  mood  may  have  some  effect  on 
the  accuracy  of  performance  ratings,  but  not  necessarily  by 
way  of  the  mechanism  proposed  by  the  frustration-mood  theory. 

The measure used to assess mood state may also
contribute to the lack of effect for this variable. The
MAACL assesses general mood, not job-related affect.
Although participants were asked to report their affect in
terms of how they felt immediately after completing the
in-basket, it is possible that any feelings reported could have
been attributable to factors other than the stress
manipulation. Perhaps a more appropriate measure of mood
state would be one that is more specific to the particular
job or task.

Hypothesis  3 

Stressed raters gave significantly less accurate ratings
than non-stressed raters. Accuracy was defined as the sum of
the squared differences between the rater's ratings and the
ratee's true scores, such that a small value indicates high
accuracy. In addition, accuracy was significantly correlated
with two measures of stress, subjective stress (r = .32,
p < .01) and anxiety (r = .29, p < .01), indicating inverse
relationships between rating accuracy and these two measures.
These results lend some support to the psychic cost theory.
If stressed raters have less cognitive energy to devote to
the rating task, they may, instead of relying on global
impressions of performance, simply rate in a rather random,
inaccurate fashion, without attending to particular
behavioral dimensions. This does not imply a halo effect.
Another explanation of the effects of stress on accuracy but
not halo is that stressed raters may remember less about the
performance, and therefore give inaccurate evaluations. The
analysis of the effect of stress level on recognition memory
does not lend support to this notion, however. There was no
main effect of stress level on recognition. The interaction
results show that the decreased recognition of raters in the
retrieval phase was actually minimised under conditions of
high stress. Stress appears to have reduced, rather than
increased, the memory impairment of these raters.
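The accuracy measure defined above, the sum of squared differences between a rater's ratings and the ratee's true scores, can be written out as a short sketch. The function name and the example ratings are hypothetical; only the formula comes from the text.

```python
def accuracy_score(ratings, true_scores):
    """Sum of squared differences between ratings and true scores.

    Smaller values indicate more accurate ratings; 0 means the rater
    matched the true score on every dimension.
    """
    if len(ratings) != len(true_scores):
        raise ValueError("ratings and true_scores must have the same length")
    return sum((r - t) ** 2 for r, t in zip(ratings, true_scores))


if __name__ == "__main__":
    # Hypothetical ratings on three dimensions against hypothetical true scores.
    print(accuracy_score([5, 3, 4], [4, 4, 2]))
```

Because deviations are squared, the measure penalises large misses on a single dimension more heavily than several small ones, and it does not distinguish over-rating from under-rating.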

The raters' inexperience with the appraisal task may
have contributed to the effect of stress on accuracy but not
halo. Individuals who have not developed a schema or
prototype of the "successful manager" in the particular
situation  depicted  in  the  scenario  will  be  unable  to  fall 
back  on  any  schema  as  a  basis  for  evaluation.  Unable  to 
attribute  illusory  characteristics  of  that  schema  to  the 
ratee,  these  raters  will  not  form  global  impressions  of  the 
ratee,  and  therefore  will  not  exhibit  a  great  deal  of  halo 
error.  This  does  not,  however,  imply  that  their  ratings  will 
be  more  accurate.  The  student  raters  in  this  study  were  not 
likely  to  have  had  the  opportunity  (i.e.,  experience  with  the 
task)  to  develop  such  a  schema. 

The  question  of  when  stress  has  an  effect  on  rating 
accuracy  was  addressed  by  examining  main  effects  of  the 
timing  variable  on  dependent  measures,  and  the  stress  X 
timing  interactions.  The  only  significant  effects  were  on 
the  accuracy  measure.  The  raters  who  were  exposed  to  either 
stress  manipulation  during  the  retrieval  phase  of  the  process 
were  significantly  less  accurate  than  those  in  the  input 
conditions.  The  stress  X  timing  interaction  was  not 
significant  for  any  of  the  dependent  variables.  The  presence 
of  a  main  effect  of  timing,  with  no  stress  X  timing 
interaction  is  certainly  perplexing.  It  would  be  expected 
that  if  the  stress  manipulation  were  salient,  as  is  indicated 
by  the  MANOVA  and  ANOVA  results,  a  significant  interaction 
should be found. One explanation is that raters in the input
conditions  were  given  a  10-minute  water  break  immediately 
following  the  observation  of  the  performance.  Since  there 
were  no  apparent  competing  demands  on  memory  during  the 
break,  there  was  an  opportunity  for  what  was  just  observed  to 
be  encoded  and  stored  in  memory  properly.  Raters  in  the 
retrieval  conditions,  however,  began  the  in-basket  soon  after 
observing the performance, and may not have been able to
store  performance  information  properly  when  under  these 
distracting  circumstances.  Even  though  raters  in  the  input 
conditions  were  stressed  prior  to  observing  the  performance, 
this  did  not  seem  to  have  as  strong  an  effect  on  rating 
accuracy  as  stress  during  retrieval  of  performance 
information.  This  supports  the  contentions  of  Ilgen  and 
Feldman  (1983)  that  the  rating  process  is  more  of  a  memory- 
based  phenomenon  than  a  stimulus-based  one,  although  the 
issue  concerning  competing  demands  during  information  storage 
(discussed  above)  should  be  kept  in  mind.  Further  support 
for  the  notion  of  rating  as  a  memory-based  process  is  given 
by  the  significant  correlation  between  recognition  and 
accuracy (r = -.22, p < .05). Raters who correctly recognized
more  information  about  the  performance  were  more  accurate  in 
their  rating. 

A significant positive correlation between cognitive
complexity and accuracy of ratings (r = .44, p < .01) indicates
that as cognitive complexity increases, accuracy increases.
This supports the findings of Cardy and Kehoe (1984). This
relationship holds true only under conditions of high stress
or input, indicating that stressed raters, whose attention to
the stimulus performance is hindered, are more likely to make
accurate judgments if they are better able to differentiate
between multidimensional stimuli (i.e., are more cognitively
complex). It appears that this ability may help raters who
are distracted during observation and encoding of performance
stimuli overcome this distraction, and successfully input the
necessary information.

Overall, these results indicate that stress does lead to
the distortion of performance ratings, but only when
distortion is defined in terms of accuracy, and not when
defined as halo or severity error. Therefore, the psychic
cost hypothesis is supported, but some redefinition is
needed. This theory states that stress will drain a person's
store of cognitive energy, thereby causing him or her to
operate in an automatic mode of processing on subsequent
tasks. In an attempt to conserve energy, he or she will use
a global impression or prototype as a basis for judgments
concerning different dimensions of another person's
performance. This will result in halo error. There was no
evidence of such an effect. However, there is evidence that
high levels of stress will lead to a loss of rating accuracy,
although the mechanism by which this occurs is unclear. It
may be a function of random assignment of ratings (perhaps
due to a lack of familiarity with the stimulus task, and
therefore a lack of a developed prototype or standard of
comparison for it), faulty memory of the rater, or both. It
may be difficult to store and therefore remember information
that is not categorised within a well-developed schema.

These  findings  are  partially  supportive  of  Srinivas  and 
Motowidlo  (1985).  As  in  their  study,  severity  of  ratings  was 
not  significantly  affected  by  rater  stress.  Contrary  to 
their  results,  however,  high  stress  did  not  lead  to  greater 
halo  error.  This  may  be,  in  part,  due  to  the  correction  in 
the  present  study  of  the  time  delay  confound.  When  accuracy 
was  defined  as  the  amount  of  discrepancy  between  experimental 
ratings  and  the  ratee's  true  scores,  as  in  this  study,  rater 
stress  did  have  an  adverse  effect.  Such  a  measure  of 
accuracy  was  not  collected  in  the  earlier  study.  There  is 
partial  support  for  the  previous  findings  that  stress  has  its 
impact  during  the  retrieval  phase  of  the  appraisal  process, 
in  that  subjects  in  the  retrieval  conditions  exhibited 
significantly  lower  accuracy  in  ratings.  However,  the  stress 
X  timing  interaction  failed  to  reach  significance,  which 
indicates  that  the  previous  significant  interaction  may  have 
been  due  to  the  time  delay  confound  present  in  the  earlier 
design. 

CONCLUSIONS 

The  finding  that  stress  has  a  greater  effect  when 
presented  during  the  retrieval  phase  of  the  appraisal  process 
lends  further  support  to  the  previous  findings  (Heneman  & 
Wexley,  1983)  that  ratings  done  immediately  after  observation 
of  the  performance  are  more  accurate  than  ratings  done  after 
a  time  delay.  If  the  rater  stress-accuracy  relationship  is 
largely  a  function  of  memory,  ratings  should  be  given  as  soon 
after  observation  as  possible. 


In terms of rater stress itself, steps might be taken
(either on the individual or organisational level) to reduce
the stress experienced by supervisors during appraisal time,
in order to allow them more time and energy to devote to the
task. This might involve a reduced workload during this
time, stress management education, and time off for appraisal
training (or retraining). If raters can be taught to
recognise their personal symptoms of stress, they may be able
to anticipate rating inaccuracies, and prevent them. Also,
performance appraisals might be scheduled around less
stressful times during the rater's year. Or supervisors
experiencing great amounts of stress may be allowed to
postpone their appraisal duties. Whatever the method, the
alleviation of rater stress during performance appraisal
should have positive effects on the accuracy of ratings, and
the meaningfulness of the appraisal system on the whole.